Resources for Information Extraction from Polish texts
Abstract
The paper presents a collection of resources developed for Information Extraction (IE) from Polish texts. In particular, we mention two IE platforms adapted to Polish and several IE applications built on top of one of them: named entity recognition, creation of terminology lexicons, and data extraction from medical texts.
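The data-extraction application mentioned above can be pictured as a set of pattern rules applied to free-text records, with the matches loaded into a relational table. The Python sketch below is a minimal illustration under stated assumptions. The field names (hba1c, weight_kg), the regular expressions, the extract and store helpers, and the table layout are all hypothetical. None of them is taken from the platforms or applications described in the paper.

```python
import re
import sqlite3

# Hypothetical extraction rules: illustrative assumptions only,
# not the rules used by the IE applications described in the paper.
RULES = {
    "hba1c": re.compile(r"HbA1c\s*[:=]?\s*(\d+(?:,\d+)?)\s*%", re.IGNORECASE),
    "weight_kg": re.compile(r"mas[ay] ciała\s*[:=]?\s*(\d+(?:,\d+)?)\s*kg", re.IGNORECASE),
}


def extract(record_id, text):
    """Apply every rule to a free-text record and return (id, field, value) rows."""
    rows = []
    for field, pattern in RULES.items():
        for match in pattern.finditer(text):
            # Polish texts use a decimal comma, so normalise it before converting.
            value = float(match.group(1).replace(",", "."))
            rows.append((record_id, field, value))
    return rows


def store(rows):
    """Load extracted rows into an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE facts (record_id INTEGER, field TEXT, value REAL)")
    conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", rows)
    return conn


text = "Przy przyjęciu HbA1c: 8,2%. Masa ciała 84 kg."
conn = store(extract(1, text))
print(conn.execute("SELECT field, value FROM facts ORDER BY field").fetchall())
```

Keeping the rules in a single dictionary separates the linguistic knowledge (patterns) from the storage step, so new fields can be added without touching the database code.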
Similar resources
Terminology extraction from medical texts in Polish
BACKGROUND: Hospital documents contain free text describing the most important facts relating to patients and their illnesses. These documents are written in a specific language containing medical terminology related to hospital treatment. Their automatic processing can help in verifying the consistency of hospital documentation and obtaining statistical data. To perform this task we need informat...
Automatic Processing of Diabetic Patients' Hospital Documentation
The paper presents a rule-based information extraction (IE) system for Polish medical texts. We select the most important information from diabetic patients’ records. Most of the processed data are free-form texts; only a part is in table form. The work has three goals: to test classical IE methods on texts in Polish, to create a relational database containing the extracted data, and to prepare an...
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
Dependency-based Extraction of Entity-relationship Triples from Polish Open-domain Texts
We present a prototype system for extracting arbitrary relations between named entities from open-domain texts in Polish, based on DEBORA, a dependency-based approach to the problem. The presented method is designed for the purpose of the conducted experiment and is adapted to the morpho-syntactic properties of Polish, e.g. free word order and a high degree of morphological marking. Our preliminary resu...
DEBORA: Dependency-Based Method for Extracting Entity-Relationship Triples from Open-Domain Texts in Polish
We present DEBORA, a dependency-based approach to the problem of extracting arbitrary relations between named entities from open-domain texts in Polish. The presented method is designed for the purpose of the conducted experiment and is adapted to the morpho-syntactic properties of Polish, e.g. free word order and a high degree of morphological marking. Our preliminary results show that the method i...
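The dependency-based idea behind the DEBORA entries can be illustrated with a minimal sketch. Polish word order is free, so reading a subject-verb-object triple off the dependency relations (nsubj, obj) rather than off token positions yields the same triple for both orders of a sentence. The hand-written Universal Dependencies-style parses, the Token class, and the phrase and extract_triples functions are hypothetical illustrations. This is not the DEBORA implementation. A real system would take its parses from a dependency parser instead of hard-coding them.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int      # 1-based position in the sentence
    form: str     # surface form
    head: int     # index of the syntactic head (0 = root)
    deprel: str   # Universal Dependencies relation label


def phrase(tokens, head_idx):
    """Return the head word plus its 'flat' dependents (multi-word names), in sentence order."""
    parts = [t for t in tokens
             if t.idx == head_idx or (t.head == head_idx and t.deprel == "flat")]
    return " ".join(t.form for t in sorted(parts, key=lambda t: t.idx))


def extract_triples(tokens):
    """Return (subject, predicate, object) for every head with both nsubj and obj dependents."""
    triples = []
    for verb in tokens:
        deps = [t for t in tokens if t.head == verb.idx]
        subjects = [t for t in deps if t.deprel == "nsubj"]
        objects = [t for t in deps if t.deprel == "obj"]
        for s in subjects:
            for o in objects:
                triples.append((phrase(tokens, s.idx), verb.form, phrase(tokens, o.idx)))
    return triples


# Hypothetical hand-written parses of the same fact in two word orders.
# Object first: "Jana Kowalskiego zatrudnił Orlen."
tokens_ovs = [Token(1, "Jana", 3, "obj"), Token(2, "Kowalskiego", 1, "flat"),
              Token(3, "zatrudnił", 0, "root"), Token(4, "Orlen", 3, "nsubj")]
# Subject first: "Orlen zatrudnił Jana Kowalskiego."
tokens_svo = [Token(1, "Orlen", 2, "nsubj"), Token(2, "zatrudnił", 0, "root"),
              Token(3, "Jana", 2, "obj"), Token(4, "Kowalskiego", 3, "flat")]

print(extract_triples(tokens_ovs))
print(extract_triples(tokens_svo))
```

Both calls return the same triple. Relying on relation labels rather than linear position is what makes this kind of extraction robust to the free word order of Polish.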